Leveraging output term co-occurrence frequencies and latent associations in predicting medical subject headings
نویسندگان
چکیده
Trained indexers at the National Library of Medicine (NLM) manually tag each biomedical abstract with the most suitable terms from the Medical Subject Headings (MeSH) terminology to be indexed by their PubMed information system. MeSH has over 26,000 terms and indexers look at each article's full text while assigning the terms. Recent automated attempts focused on using the article title and abstract text to identify MeSH terms for the corresponding article. Most of these approaches used supervised machine learning techniques that use already indexed articles and the corresponding MeSH terms. In this paper, we present a new indexing approach that leverages term co-occurrence frequencies and latent term associations computed using MeSH term sets corresponding to a set of nearly 18 million articles already indexed with MeSH terms by indexers at NLM. The main goal of our study is to gauge the potential of output label co-occurrences, latent associations, and relationships extracted from free text in both unsupervised and supervised indexing approaches. In this paper, using a novel and purely unsupervised approach, we achieve a micro-F-score that is comparable to those obtained using supervised machine learning techniques. By incorporating term co-occurrence and latent association features into a supervised learning framework, we also improve over the best results published on two public datasets.
منابع مشابه
Unsupervised Medical Subject Heading Assignment Using Output Label Co-occurrence Statistics and Semantic Predications
Librarians at the National Library of Medicine tag each biomedical abstract to be indexed by their Pubmed information system with terms from the Medical Subject Headings (MeSH) terminology. The MeSH terminology has over 26,000 terms and indexers look at each article's full text to assign a set of most suitable terms for indexing it. Several recent automated attempts focused on using the article...
متن کاملVisualizing Multiple System Atrophy Studies Based on Collaboration Network and Centrality Indices in Web of Science Database
Introduction: Social network analysis is an analytical method based on graph theories that identifies relationships between individuals or factors to analyze the social structures resulted from those relationships. The objective of this study was to analyze co-authorship and co-word networks based on scientometric indicators and centrality measures in the studies on multiple atrophy system dise...
متن کاملVisualizing Multiple System Atrophy Studies Based on Collaboration Network and Centrality Indices in Web of Science Database
Introduction: Social network analysis is an analytical method based on graph theories that identifies relationships between individuals or factors to analyze the social structures resulted from those relationships. The objective of this study was to analyze co-authorship and co-word networks based on scientometric indicators and centrality measures in the studies on multiple atrophy system dise...
متن کاملThe Effect of Weighted Term Frequencies on Probabilistic Latent Semantic Term Relationships
Probabilistic latent semantic analysis (PLSA) is a method of calculating term relationships within a document set using term frequencies. It is well known within the information retrieval community that raw term frequencies contain various biases that affect the precision of the retrieval system. Weighting schemes, such as BM25, have been developed in order to remove such biases and hence impro...
متن کاملLow-Rank Hidden State Embeddings for Viterbi Sequence Labeling
In textual information extraction and other sequence labeling tasks it is now common to use recurrent neural networks (such as LSTM) to form rich embedded representations of long-term input co-occurrence patterns. Representation of output co-occurrence patterns is typically limited to a hand-designed graphical model, such as a linear-chain CRF representing short-term Markov dependencies among s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Data & knowledge engineering
دوره 94 B شماره
صفحات -
تاریخ انتشار 2014